Cross-entropy vs. squared error training: a theoretical and experimental comparison

Authors

  • Pavel Golik
  • Patrick Doetsch
  • Hermann Ney

Abstract

In this paper we investigate the error criteria that are optimized during the training of artificial neural networks (ANN). We compare the bounds of the squared error (SE) and the cross-entropy (CE) criteria, the most popular choices in state-of-the-art implementations. The evaluation is performed on automatic speech recognition (ASR) and handwriting recognition (HWR) tasks using a hybrid HMM-ANN model. We find that with randomly initialized weights, the ANN trained with the squared error criterion does not converge to a good local optimum. However, with a good initialization by pre-training, the word error rate of our best CE-trained system could be reduced from 30.9% to 30.5% on the ASR task, and from 22.7% to 21.9% on the HWR task, by performing a few additional “fine-tuning” iterations with the SE criterion.
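
The contrast between the two criteria can be made concrete for a single frame. The following Python sketch assumes a softmax output layer and a one-hot target, and uses the unscaled sum of squares for SE (the exact normalization used in the paper is not stated in this abstract). It shows that per-frame CE is unbounded while SE never exceeds 2, and that the SE gradient with respect to the logits nearly vanishes when the network is confidently wrong. This is one plausible way to read the finding that SE training from random weights converges poorly, not the authors' own derivation; all helper names are illustrative.

```python
import math


def softmax(logits):
    """Softmax over a list of logits (max-subtracted for numerical stability)."""
    m = max(logits)
    exps = [math.exp(a - m) for a in logits]
    z = sum(exps)
    return [e / z for e in exps]


def one_hot(c, n):
    """One-hot target vector of length n with a 1 at index c."""
    return [1.0 if k == c else 0.0 for k in range(n)]


def cross_entropy(y, c):
    """CE for a one-hot target at class c: -log y_c (unbounded as y_c -> 0)."""
    return -math.log(y[c])


def squared_error(y, c):
    """SE for a one-hot target: sum_k (y_k - t_k)^2.

    For a probability vector y (non-negative, summing to 1) this never exceeds 2.
    """
    t = one_hot(c, len(y))
    return sum((yk - tk) ** 2 for yk, tk in zip(y, t))


def grad_ce(y, c):
    """Gradient of CE w.r.t. the logits a_j: y_j - t_j."""
    t = one_hot(c, len(y))
    return [yk - tk for yk, tk in zip(y, t)]


def grad_se(y, c):
    """Gradient of SE w.r.t. the logits a_j, using dy_k/da_j = y_k (delta_kj - y_j):

    2 * y_j * ((y_j - t_j) - sum_k (y_k - t_k) * y_k)
    """
    t = one_hot(c, len(y))
    s = sum((yk - tk) * yk for yk, tk in zip(y, t))
    return [2.0 * yj * ((yj - tj) - s) for yj, tj in zip(y, t)]


if __name__ == "__main__":
    # A confidently wrong frame: the network favours class 0, the target is class 1.
    y = softmax([10.0, 0.0, 0.0])
    print("CE:", cross_entropy(y, 1), "grad:", grad_ce(y, 1))
    print("SE:", squared_error(y, 1), "grad:", grad_se(y, 1))
```

On this confidently wrong frame, CE is roughly 10 and its gradient has components close to ±1, whereas SE stays just below its bound of 2 and every SE gradient component is smaller than 1e-3 in magnitude, so plain gradient descent on SE barely corrects such a frame.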

Similar resources

Comparison of Entropy and Mean Square Error Criteria in Adaptive System Training Using Higher Order Statistics

The error-entropy-minimization approach in adaptive system training is addressed in this paper. The effect of Parzen windowing on the location of the global minimum of entropy has been investigated. An analytical proof that shows the global minimum of the entropy is a local minimum, possibly the global minimum, of the nonparametrically estimated entropy using Parzen windowing with Gaussian kern...

Discriminative Training of the Scanning N-Tuple Classifier

The Scanning N-Tuple classifier (SNT) was introduced by Lucas and Amiri [1, 2] as an efficient and accurate classifier for chain-coded handwritten digits. The SNT operates at speeds of tens of thousands of sequences per second during both the training and the recognition phases. The main contribution of this paper is to present a new discriminative training rule for the SNT. Two versions of th...

Weighted Heterogeneous Learning for Deep Convolutional Neural Network Based Facial Image Analysis

Recognition of facial attributes such as facial points, gender, and age has been used in marketing strategies and social networking services. Marketing strategies recommend goods that are supposed to match the needs of potential clients. Various social networking services based on facial recognition techniques have recently been developed that can estimate age from a facial image with a h...

Convergence properties and data efficiency of the minimum error entropy criterion in ADALINE training

Recently, we have proposed the minimum error entropy (MEE) criterion as an information-theoretic alternative to the widely used mean square error criterion in supervised adaptive system training. For this purpose, we have formulated a nonparametric estimator for Rényi’s entropy that employs Parzen windowing. Mathematical investigation of the proposed entropy estimator revealed interesting insig...

On the Estimation of Shannon Entropy

Shannon entropy is increasingly used in many applications. In this article, an estimator of the entropy of a continuous random variable is proposed. Consistency and scale invariance of the variance and mean squared error of the proposed estimator are proved, and comparisons are then made with Vasicek's (1976), van Es (1992), Ebrahimi et al. (1994) and Correa (1995) entropy estimators. A simulation st...

Publication date: 2013